Object-Assisted Question Featurization and Multi-CNN Image Feature Fusion for Visual Question Answering

Authors

Abstract

Visual question answering (VQA) demands meticulous and concurrent proficiency in image interpretation and natural language understanding to correctly answer a question about an image. Existing VQA solutions focus either on improving the joint multi-modal embedding or on fine-tuning the visual representation through attention. This research, in contrast to the current trend, investigates the feasibility of an object-assisted strategy, titled the semantic object ranking (SOR) framework, for VQA. The proposed system refines the question representation with the help of detected objects. For multi-CNN image representation, it employs canonical correlation analysis (CCA). The suggested model is assessed using accuracy and WUPS measures on the DAQUAR dataset. On this dataset, the analytical outcomes reveal that the presented method outperforms the prior state-of-the-art by a significant factor. In addition to the quantitative analysis, proper illustrations are supplied to observe the reasons for the performance improvement.
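A minimal sketch of how CCA-based fusion of two CNN feature sets could look, using scikit-learn; the two feature extractors, their dimensions, and the concatenation of the projected views are illustrative assumptions rather than details taken from the paper.

# Hypothetical sketch: fusing features from two CNNs with CCA.
# Dimensions and the concatenation step are assumptions for illustration.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_images = 200
feats_cnn_a = rng.standard_normal((n_images, 512))   # features from CNN A (assumed)
feats_cnn_b = rng.standard_normal((n_images, 256))   # features from CNN B (assumed)

# Project both views into a shared, maximally correlated subspace.
cca = CCA(n_components=64)
proj_a, proj_b = cca.fit_transform(feats_cnn_a, feats_cnn_b)

# One simple fusion choice: concatenate the two projected views.
fused = np.concatenate([proj_a, proj_b], axis=1)      # shape: (n_images, 128)
print(fused.shape)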


Similar articles

Image-Question-Linguistic Co-Attention for Visual Question Answering

Our project focuses on VQA: Visual Question Answering [1], specifically, answering multiple choice questions about a given image. We start by building a MultiLayer Perceptron (MLP) model with question-grouped training and softmax loss. GloVe embeddings and ResNet image features are used. We are able to achieve near state-of-the-art accuracy with this model. Then we add image-question co-attention [...
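A minimal sketch, in PyTorch, of the kind of MLP baseline described above: an averaged GloVe question vector is concatenated with a ResNet image feature and classified over the answer vocabulary with a cross-entropy (softmax) loss. All dimensions and the two-layer architecture are assumptions for illustration, not the authors' exact model.

import torch
import torch.nn as nn

class MLPBaseline(nn.Module):
    def __init__(self, q_dim=300, img_dim=2048, hidden=1024, n_answers=1000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(q_dim + img_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(hidden, n_answers),  # logits; softmax is applied inside the loss
        )

    def forward(self, q_feat, img_feat):
        return self.net(torch.cat([q_feat, img_feat], dim=1))

model = MLPBaseline()
q = torch.randn(8, 300)      # averaged GloVe question embeddings (assumed)
v = torch.randn(8, 2048)     # ResNet image features (assumed)
loss = nn.CrossEntropyLoss()(model(q, v), torch.randint(0, 1000, (8,)))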


Hierarchical Question-Image Co-Attention for Visual Question Answering

A number of recent works have proposed attention models for Visual Question Answering (VQA) that generate spatial maps highlighting image regions relevant to answering the question. In this paper, we argue that in addition to modeling “where to look” or visual attention, it is equally important to model “what words to listen to” or question attention. We present a novel co-attention model for V...
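A minimal sketch of the affinity-matrix step that co-attention models of this kind build on: question-word features and image-region features score each other, yielding both "where to look" and "what words to listen to" attention maps. The single-layer formulation, shapes, and initialization are assumptions for illustration, not the paper's exact model.

import torch
import torch.nn as nn

T, N, d = 12, 196, 512            # question length, image regions, feature dim (assumed)
Q = torch.randn(1, T, d)          # question-word features
V = torch.randn(1, N, d)          # image-region features
W_b = nn.Parameter(torch.randn(d, d) * 0.01)

# Affinity between every word and every region: C[t, n] = tanh(q_t^T W_b v_n)
C = torch.tanh(Q @ W_b @ V.transpose(1, 2))               # (1, T, N)

# Attention over regions for each word, and over words for each region.
attn_over_regions = torch.softmax(C, dim=2)               # "where to look", per word
attn_over_words = torch.softmax(C, dim=1)                 # "what to listen to", per region

attended_image = attn_over_regions @ V                    # (1, T, d)
attended_question = attn_over_words.transpose(1, 2) @ Q   # (1, N, d)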


Multi-Dimensional Feature Merger for Question Answering

In this paper, we introduce new features for question-answering systems. These features are inspired by the fact that justification of the correct answer (out of many candidate answers) may be present in multiple passages. Our features attempt to combine evidence from multiple passages retrieved for a candidate answer. We present results on two data-sets: Jeopardy! and Doctor’s Dilemma. In both...
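A minimal sketch of the idea of merging evidence across passages for one candidate answer: per-passage scores are collapsed into a few candidate-level features. The score source and the particular aggregates (max, mean, support count) are hypothetical choices for illustration.

# Hypothetical multi-passage evidence merger for one candidate answer.
def merge_passage_evidence(passage_scores):
    """passage_scores: relevance scores of passages supporting one candidate answer."""
    if not passage_scores:
        return {"max": 0.0, "mean": 0.0, "support_count": 0}
    return {
        "max": max(passage_scores),
        "mean": sum(passage_scores) / len(passage_scores),
        "support_count": len(passage_scores),
    }

print(merge_passage_evidence([0.42, 0.37, 0.55]))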


Investigating Embedded Question Reuse in Question Answering

This paper presents a novel method in question answering (QA) that enables a QA system to gain performance through reuse of information in the answer to one question to answer another, related question. Our analysis shows that a pair of questions in general open-domain QA can have an embedding relation through their mentions of noun phrase expressions. We present methods f...


Generalized Hadamard-Product Fusion Operators for Visual Question Answering

We propose a generalized class of multimodal fusion operators for the task of visual question answering (VQA). We identify generalizations of existing multimodal fusion operators based on the Hadamard product, and show that specific nontrivial instantiations of this generalized fusion operator exhibit superior performance in terms of OpenEnded accuracy on the VQA task. In particular, we introdu...
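A minimal sketch of plain Hadamard-product fusion, the base case that generalized operators of this kind build on: both modalities are projected to a common joint space and combined element-wise before classification. Dimensions, activations, and the single projection layer are assumptions for illustration.

import torch
import torch.nn as nn

class HadamardFusion(nn.Module):
    def __init__(self, q_dim=300, v_dim=2048, joint_dim=1024, n_answers=1000):
        super().__init__()
        self.q_proj = nn.Linear(q_dim, joint_dim)
        self.v_proj = nn.Linear(v_dim, joint_dim)
        self.classifier = nn.Linear(joint_dim, n_answers)

    def forward(self, q_feat, v_feat):
        # Element-wise (Hadamard) product of the two projected modalities.
        joint = torch.tanh(self.q_proj(q_feat)) * torch.tanh(self.v_proj(v_feat))
        return self.classifier(joint)

fusion = HadamardFusion()
logits = fusion(torch.randn(8, 300), torch.randn(8, 2048))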



Journal

Journal title: International Journal of Intelligent Information Technologies

Year: 2023

ISSN: 1548-3657, 1548-3665

DOI: https://doi.org/10.4018/ijiit.318671